Abstract: Bit-interleaved coded modulation (BICM) is a practical
approach for reliable communication over the AWGN channel in the
bandwidth-limited regime. For a signal point constellation with
$2^m$ points, BICM labels the signal points with bit strings of
length $m$ and then treats these $m$ bits separately both at the
transmitter and the receiver. BICM capacity is defined as the
maximum of a certain achievable rate. The maximization has to be
done over the probability mass functions (pmfs) of the bits. This
is a non-convex optimization problem. So far, the optimal bit pmfs
were determined via exhaustive search, which is of exponential
complexity in $m$. In this work, an algorithm called
bit-alternating convex-concave method (BACM) is developed. This
algorithm calculates BICM capacity with a complexity that scales
approximately as $m^3$. The algorithm iteratively applies convex
optimization techniques. BACM is used to calculate BICM capacity
of 4, 8, 16, 32, and 64-PAM in AWGN. For PAM constellations with
more than 8 points, the presented values are the first results
known in the literature.

I. Introduction

Bit-interleaved coded modulation (BICM) [1]-[3] is a de
facto standard for wireless communications, and it is used in,
e.g., HSPA, IEEE 802.11a/g/n, and the latest DVB standards
(DVB-T2/S2/C2).

In BICM, signal points from a finite constellation are
labeled with bit strings. E.g., for 16-PAM, the signal points
are labeled with $\log_2 16 = 4$ bits each. The bits in the labels
are then treated independently both at the transmitter and
the receiver. According to [4], to determine BICM capacity,
a certain achievable rate has to be maximized over the bit
probability mass functions (pmfs). We will make this statement
precise later in this work. This maximization is a non-convex
optimization problem [5, Fig. 1]. So far, BICM capacity has
been calculated using exhaustive search only. For the AWGN
channel, results are presented for 8-PAM in [6, Fig. 3] and
[5, Fig. 1] and for 16-QAM in [4, Fig. 2]. The complexity of
exhaustive search is exponential in the number of bits in the
labels, and calculating BICM capacity becomes an intractable
problem for large constellations. This motivates the present
work.

Our approach is as follows. We start by considering a
discrete memoryless channel (DMC) operated by a BICM
transceiver. To calculate BICM capacity, we develop a
new algorithm called bit-alternating convex-concave method
(BACM), which combines two optimization techniques: first,
maximization is done sequentially over one bit pmf at a time,
and second, the maximization over one bit pmf is done using
the convex-concave procedure [7]. We then show how an
average power constraint can be taken into account by BACM.
This allows us to use BACM to calculate BICM capacity of
PAM constellations in AWGN. We provide numerical results
for 4 and 8-PAM and, for the first time in the literature, for 16,
32, and 64-PAM. The results show that BICM capacity is close
to AWGN capacity and significantly larger than what can be
achieved by operating BICM with uniform bit pmfs. Finally,
we argue that the complexity of BACM scales approximately
as $m^3$ and logarithmically in the precision with which the
optimal bit pmfs are calculated. An implementation of BACM
in Matlab is available on our website [8].

II. System Model and Problem Statement 

Consider a DMC with $2^m$ input symbols $\mathcal{X} = \{1, \dots, 2^m\}$
and $n$ output symbols $\mathcal{Y} = \{1, \dots, n\}$. The channel is
specified by a matrix of transition probabilities
$H \in \mathbb{R}^{n \times 2^m}$, where $\mathbb{R}$ denotes the set of
real numbers. The input of the channel is the random variable $X$,
which takes values in $\mathcal{X}$ according to the pmf $p$. The
channel output is the random variable $Y$, which takes values in
$\mathcal{Y}$ according to the pmf $r = Hp$.

A. DMC Capacity 

We denote the mutual information between $X$ and $Y$ either
by $I(X;Y)$ or by $I(p)$. The DMC capacity is [9, Eq. (7.1)]

$$C = \max_{p} I(p). \tag{1}$$

The maximization is a convex optimization problem [10,
Prob. 4.57] and it can be solved by the Blahut-Arimoto
algorithm [11], [12] or by a software package such as CVX
[13].
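
As a minimal sketch of how (1) can be solved numerically, the following Python code implements a textbook form of the Blahut-Arimoto iteration for a transition matrix given as a nested list `H[k][x]`. The function name `blahut_arimoto`, the stopping rule based on the gap between the standard lower and upper bounds, and the default tolerances are assumptions made for illustration; they are not taken from [11], [12] or from the Matlab implementation [8].

```python
import math

def blahut_arimoto(H, tol=1e-9, max_iter=10000):
    """Capacity in bits and a maximizing input pmf of a DMC, H[k][x] = P(Y=k | X=x)."""
    n, num_x = len(H), len(H[0])
    p = [1.0 / num_x] * num_x                     # start from the uniform input pmf
    lower = 0.0
    for _ in range(max_iter):
        # output pmf r = H p
        r = [sum(H[k][x] * p[x] for x in range(num_x)) for k in range(n)]
        # D[x]: divergence (in nats) between column x of H and the output pmf r
        D = [sum(H[k][x] * math.log(H[k][x] / r[k])
                 for k in range(n) if H[k][x] > 0) for x in range(num_x)]
        z = sum(p[x] * math.exp(D[x]) for x in range(num_x))
        lower, upper = math.log(z), max(D)        # capacity lies between these bounds
        if upper - lower < tol:
            break
        p = [p[x] * math.exp(D[x]) / z for x in range(num_x)]
    return lower / math.log(2), p

# Binary symmetric channel with crossover probability 0.1
capacity, pmf = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
```

For the binary symmetric channel in the usage lines, the result should agree with the closed form $1 - h_2(0.1)$, where $h_2$ is the binary entropy function.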



B. BICM Capacity 

In BICM, the input symbols are represented by their $m$-bit
binary expansion, i.e.,

$$1 \leftrightarrow 0\cdots00, \quad 2 \leftrightarrow 0\cdots01, \quad \dots, \quad 2^m \leftrightarrow 1\cdots11. \tag{2}$$

Each bit position of the channel input is treated independently
both at the transmitter and the receiver, see [3], [4] for details.
This leads to the following constraint at the transmitter:

• [4, Eq. (8)]: The bits $B_i$ in positions $i$ of the channel
input are stochastically independent, i.e., the channel
input pmf $p$ is given by

$$p = p^1 \otimes p^2 \otimes \cdots \otimes p^m \tag{3}$$

where $p^i$ is the pmf of $B_i$ and where $\otimes$ denotes the
Kronecker product, see [14, Def. 4.2.1].

According to [15, Theorem 1], the following sum of mutual
informations is an achievable rate for a BICM transceiver:

$$I^{\mathrm{bicm}}(p^1, \dots, p^m) := \sum_{i=1}^{m} I(B_i; Y). \tag{4}$$

Following [4, Eq. (19)], the "BICM capacity" $C^{\mathrm{bicm}}$ is now
given by

$$C^{\mathrm{bicm}} = \max_{p^1, \dots, p^m} I^{\mathrm{bicm}}(p^1, \dots, p^m). \tag{5}$$

Unfortunately, the maximization is a non-convex problem.
This will become clear in Sec. III.
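
The quantities in (3) and (4) can be evaluated directly for small examples. The following Python sketch builds the joint input pmf from the bit pmfs and sums the bit-wise mutual informations for a DMC given as a nested list `H[k][x]`. The names `kron_pmf`, `entropy`, and `bicm_rate`, the 0-based indexing of symbols and bit positions, and the use of base-2 logarithms are assumptions for illustration only.

```python
import math
from itertools import product

def kron_pmf(bit_pmfs):
    """Joint input pmf p = p^1 (x) ... (x) p^m for independent bits, as in (3)."""
    return [math.prod(pmf[b] for pmf, b in zip(bit_pmfs, bits))
            for bits in product((0, 1), repeat=len(bit_pmfs))]

def entropy(pmf):
    """Entropy in bits; zero-probability entries contribute nothing."""
    return -sum(v * math.log2(v) for v in pmf if v > 0)

def bicm_rate(H, bit_pmfs):
    """Sum over i of I(B_i; Y) in bits, i.e. the achievable rate (4)."""
    m, n = len(bit_pmfs), len(H)
    p = kron_pmf(bit_pmfs)
    labels = list(product((0, 1), repeat=m))      # labels[x]: bit string of symbol x
    r = [sum(H[k][x] * p[x] for x in range(len(p))) for k in range(n)]
    rate = 0.0
    for i in range(m):
        rate += entropy(r)                        # H(Y)
        for b in (0, 1):
            if bit_pmfs[i][b] == 0:
                continue
            # output pmf conditioned on B_i = b
            cond = [sum(H[k][x] * p[x] for x in range(len(p)) if labels[x][i] == b)
                    / bit_pmfs[i][b] for k in range(n)]
            rate -= bit_pmfs[i][b] * entropy(cond)   # subtract P(B_i = b) H(Y | B_i = b)
    return rate

# Noiseless channel with four inputs and uniform bits
identity = [[1.0 if k == x else 0.0 for x in range(4)] for k in range(4)]
rate = bicm_rate(identity, [[0.5, 0.5], [0.5, 0.5]])
```

`itertools.product` enumerates the bit strings with the first bit as the most significant one, which matches the ordering of the entries of the Kronecker product in (3).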

C. Problem Statement 

So far, BICM capacity has been calculated in the literature via
exhaustive search [4]-[6]. To determine the optimal bit pmfs
with a precision of $\pm d$, $I^{\mathrm{bicm}}$ has to be evaluated
$\left(\frac{1}{2d}\right)^m$ times, so the complexity of this approach
increases exponentially in the number of bit positions $m$ and
polynomially in the precision $d$. The objective of this work is to
develop an algorithm that calculates BICM capacity efficiently
compared to exhaustive search.



III. Preliminary: $I^{\mathrm{bicm}}$ as a Function of $p^i$

The goal of this section is to characterize the objective
$I^{\mathrm{bicm}}$ as a function of one bit pmf $p^i$. By this
characterization, it will become clear that $I^{\mathrm{bicm}}$ is a
non-convex function, and furthermore, we will see how we can
maximize over $p^i$. To this end, we pick an arbitrary bit position
$i$ and assume that for each $j \neq i$, $B_j$ is distributed
according to a fixed pmf and that $B_i$ is distributed according to
a pmf that we interpret as a variable. To emphasize this
distinction, we denote the fixed pmfs for $j \neq i$ by $p^j$ and the
variable pmf of $B_i$ by $p^i$. The function $I^{\mathrm{bicm}}$
can now be written as

\begin{align}
I^{\mathrm{bicm}}(p^1, \dots, p^m) &= \sum_{j=1}^{m} \left[\mathbb{H}(Y) - \mathbb{H}(Y|B_j)\right] \tag{6} \\
&= m\,\mathbb{H}(Y) - \mathbb{H}(Y|B_i) - \sum_{j \neq i} \mathbb{H}(Y|B_j). \tag{7}
\end{align}

We see that there are three kinds of terms that we need to
express as functions of $p^i$: the output entropy $\mathbb{H}(Y)$, the
conditional entropy $\mathbb{H}(Y|B_i)$, and the conditional entropy
$\mathbb{H}(Y|B_j)$ for $j \neq i$.

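A. Output entropy as a function of $p^i$

Define

\begin{align}
q_0 &:= p^1 \otimes \cdots \otimes p^{i-1} \otimes (1\ 0)^T \otimes p^{i+1} \otimes \cdots \otimes p^m \tag{8} \\
q_1 &:= p^1 \otimes \cdots \otimes p^{i-1} \otimes (0\ 1)^T \otimes p^{i+1} \otimes \cdots \otimes p^m. \tag{9}
\end{align}

The channel seen by the $i$th bit is now given by

$$H^i = H\,(q_0\ \ q_1) \in \mathbb{R}^{n \times 2}. \tag{10}$$

The output pmf can now be written as

$$r = Hp = H^i p^i. \tag{11}$$

Thus, the output entropy as a function of $p^i$ is given by

\begin{align}
\mathbb{H}(Y) &= -\sum_{k=1}^{n} r_k \log r_k \tag{12} \\
&= -\sum_{k=1}^{n} (r)_k \log (r)_k \tag{13} \\
&= -\sum_{k=1}^{n} (H^i p^i)_k \log (H^i p^i)_k \tag{14}
\end{align}

where $(x)_k$ denotes the $k$th entry of the vector $x$. Since
$-x \log x$ is concave in $x$, we conclude that the output entropy
is concave in $p^i$.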
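
The construction in (8)-(11) can be checked numerically. The sketch below computes the $n \times 2$ matrix $H^i$ without forming $q_0$ and $q_1$ explicitly, by accumulating each column of $H$ into the column of $H^i$ that matches bit $i$ of the symbol label. The function name `bit_channel` and the 0-based bit index are assumptions for illustration.

```python
import math
from itertools import product

def bit_channel(H, bit_pmfs, i):
    """Return H^i = H (q_0 q_1) as an n x 2 nested list; i is a 0-based bit position."""
    m, n = len(bit_pmfs), len(H)
    Hi = [[0.0, 0.0] for _ in range(n)]
    for x, bits in enumerate(product((0, 1), repeat=m)):
        # entry x of q_b is the product of the other bits' probabilities
        # if bit i of symbol x equals b, and zero otherwise
        weight = math.prod(bit_pmfs[j][bits[j]] for j in range(m) if j != i)
        for k in range(n):
            Hi[k][bits[i]] += H[k][x] * weight
    return Hi

# Two output symbols, two bits (four input symbols)
H = [[0.9, 0.6, 0.3, 0.1],
     [0.1, 0.4, 0.7, 0.9]]
bit_pmfs = [[0.2, 0.8], [0.5, 0.5]]
H0 = bit_channel(H, bit_pmfs, 0)
# r = H^i p^i, the right-hand side of (11)
r_from_bit0 = [H0[k][0] * bit_pmfs[0][0] + H0[k][1] * bit_pmfs[0][1] for k in range(2)]
```

In this example the joint pmf is $p = (0.1, 0.1, 0.4, 0.4)^T$, and both $Hp$ and $H^i p^i$ give the same output pmf, as stated in (11).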

B. Conditional entropy $\mathbb{H}(Y|B_i)$ as a function of $p^i$

The output entropy conditioned on the $i$th bit can be written
as

$$\mathbb{H}(Y|B_i) = -\sum_{b=0}^{1} p^i_b \sum_{k=1}^{n} (H^i)_{kb} \log (H^i)_{kb} \tag{15}$$

where we index the rows of $H^i$ by $1, \dots, n$ and the columns
by the binary values $0, 1$, e.g., $(H^i)_{10}$ is the entry of $H^i$
in the first row and first column. We conclude from (15) that
$\mathbb{H}(Y|B_i)$ [and thereby $-\mathbb{H}(Y|B_i)$, which contributes
to the objective function] is linear in $p^i$.



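C. Conditional entropy $\mathbb{H}(Y|B_j)$ as a function of $p^i$

Define, analogously to (8) and (9), for $b_j, b_i \in \{0, 1\}$ the
vector $q_{b_j b_i}$ as the Kronecker product
$p^1 \otimes \cdots \otimes p^m$ in which $p^j$ is replaced by
$e_{b_j}$ and $p^i$ is replaced by $e_{b_i}$, where $e_0 = (1\ 0)^T$
and $e_1 = (0\ 1)^T$. This defines the four vectors

$$q_{00}, \quad q_{01}, \quad q_{10}, \quad q_{11}. \qquad \text{(16)-(19)}$$

Now, the channel seen by the $j$th and the $i$th bit is given by

$$H^{ji} = H\,(q_{00}\ \ q_{01}\ \ q_{10}\ \ q_{11}) \in \mathbb{R}^{n \times 4}. \tag{20}$$

The channel seen by the $j$th bit can be written as

$$H^j = H^{ji} \begin{pmatrix} p^i & 0 \\ 0 & p^i \end{pmatrix}. \tag{21}$$

Thus, the output entropy conditioned on the $j$th bit is

\begin{align}
\mathbb{H}(Y|B_j) &= -\sum_{b=0}^{1} p^j_b \sum_{k=1}^{n} (H^j)_{kb} \log (H^j)_{kb} \tag{22} \\
&= -\sum_{b=0}^{1} p^j_b \sum_{k=1}^{n} \left(H^{ji} \begin{pmatrix} p^i & 0 \\ 0 & p^i \end{pmatrix}\right)_{kb} \log \left(H^{ji} \begin{pmatrix} p^i & 0 \\ 0 & p^i \end{pmatrix}\right)_{kb}. \tag{23}
\end{align}

Since $-x \log x$ is concave in $x$, we conclude that $\mathbb{H}(Y|B_j)$ is
concave in $p^i$. As a consequence, the term $-\mathbb{H}(Y|B_j)$, which
contributes to the objective function, is convex in $p^i$.

Algorithm 1. (bit alternation)

  $p^1, \dots, p^m \leftarrow$ starting point
  repeat                                   (bit alternation, outer loop)
    for $i = 1, \dots, m$                  (bit alternation, inner loop)
      maximize $I^{\mathrm{bicm}}$ over $p^i$    (see Alg. 2)
      update $p^i$ with the maximizing $p^i$
    end for
  until convergence

Algorithm 2. (convex-concave procedure)

  calculate $H^i$ and $H^{ji}$, $j \neq i$
  initialize $p^i$ with the current pmf of $B_i$
  repeat
    1. $\tilde{p}^i \leftarrow p^i$
    2. $p^i \leftarrow \operatorname{argmax}_{p^i} f^i(p^i, \tilde{p}^i)$    (see Subsec. IV-B)
  until convergence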

D. Summary

The objective function as a function of $p^i$ can be
characterized as follows:

$$I^{\mathrm{bicm}}(p^1, \dots, p^{i-1}, p^i, p^{i+1}, \dots, p^m) = \underbrace{m\,\mathbb{H}(Y)}_{\text{concave in } p^i} \underbrace{-\,\mathbb{H}(Y|B_i)}_{\text{linear in } p^i} + \underbrace{\sum_{j \neq i} \left[-\mathbb{H}(Y|B_j)\right]}_{\text{convex in } p^i}. \tag{24}$$

As a sum of convex and concave terms, $I^{\mathrm{bicm}}$ is a non-convex
function. However, as we detail in the next section, the
convex-concave procedure [7] can be applied to maximize
$I^{\mathrm{bicm}}$ over $p^i$.

IV. BACM Algorithm

The objective $I^{\mathrm{bicm}}$ is a non-convex function of the pmfs
$p^1, \dots, p^m$ with potentially more than one local maximum.
Thus, finding an efficient algorithm that provably finds the
global maximum is difficult. Therefore, we resort to the
simpler problem of finding a local maximum. With a good
starting point, such an approach can nevertheless find the
global maximum. To find local maxima, efficient methods are
available. For the problem at hand, we choose the combination
of two methods.

• We maximize over one bit pmf $p^i$ at a time and then
cycle through the $i = 1, \dots, m$ until convergence. This
approach goes under the name alternating maximization.

• To maximize over one bit pmf $p^i$, we iteratively
approximate $I^{\mathrm{bicm}}$ by a lower bound that is concave in
$p^i$ and maximize this concave lower bound. After convergence,
the maximum of the concave lower bound is also a local
maximum of $I^{\mathrm{bicm}}$ as a function of $p^i$. This technique is
known as the convex-concave procedure [7].

We call this approach the bit-alternating convex-concave
method (BACM). The alternating maximization over the bit
pmfs is displayed in Alg. 1. The maximization over one bit
pmf is detailed next.



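A. Concave Lower Bound

As the objective is the sum of concave and convex functions,
it cannot be maximized directly. However, the convex-concave
procedure as defined in [16, slide 26] can be applied. Define
the function $h^j(p^i)$ as the negative of the right-hand side
of (23). This function is convex in $p^i$. The convex-concave
procedure is an iterative method and works as follows. Denote
by $\tilde{p}^i$ the result for $p^i$ in the previous step. Then, in
the current step, approximate $h^j(p^i)$ by its first-order Taylor
expansion in $\tilde{p}^i$, i.e., by

$$\hat{h}^j(p^i, \tilde{p}^i) := h^j(\tilde{p}^i) + \nabla h^j(\tilde{p}^i)^T (p^i - \tilde{p}^i). \tag{25}$$

Note that since $h^j(p^i)$ is convex in $p^i$ and the approximation
$\hat{h}^j(p^i, \tilde{p}^i)$ is linear in $p^i$, the approximation
$\hat{h}^j(p^i, \tilde{p}^i)$ lower bounds $h^j(p^i)$ for any value of
$p^i$. By a calculation similar to [17, (7.61)-(7.63)] it can be shown
that $\hat{h}^j$ is given by

$$\hat{h}^j(p^i, \tilde{p}^i) = \sum_{b=0}^{1} p^j_b \sum_{k=1}^{n} \left(H^{ji} \begin{pmatrix} p^i & 0 \\ 0 & p^i \end{pmatrix}\right)_{kb} \log \left(H^{ji} \begin{pmatrix} \tilde{p}^i & 0 \\ 0 & \tilde{p}^i \end{pmatrix}\right)_{kb}. \tag{26}$$

Putting all together, we have a concave lower bound of
$I^{\mathrm{bicm}}$ as a function of $p^i$ given by

\begin{align}
f^i(p^i, \tilde{p}^i) &:= m\,\mathbb{H}(Y) - \mathbb{H}(Y|B_i) + \sum_{j \neq i} \hat{h}^j(p^i, \tilde{p}^i) \tag{27} \\
&= -m \sum_{k=1}^{n} (H^i p^i)_k \log (H^i p^i)_k + \sum_{b=0}^{1} p^i_b \sum_{k=1}^{n} (H^i)_{kb} \log (H^i)_{kb} \notag \\
&\quad + \sum_{j \neq i} \sum_{b=0}^{1} p^j_b \sum_{k=1}^{n} \left(H^{ji} \begin{pmatrix} p^i & 0 \\ 0 & p^i \end{pmatrix}\right)_{kb} \log \left(H^{ji} \begin{pmatrix} \tilde{p}^i & 0 \\ 0 & \tilde{p}^i \end{pmatrix}\right)_{kb}. \tag{28}
\end{align}

Since $f^i$ is a concave function of $p^i$, it can be maximized
efficiently over $p^i$, as we will explain in detail in the next
subsection. We iteratively update $\tilde{p}^i$ with the value of $p^i$
that maximizes $f^i(p^i, \tilde{p}^i)$. Algorithm 2 illustrates this
procedure. After convergence, the pmf $p^i$ locally maximizes
$I^{\mathrm{bicm}}$ over $p^i$ given the fixed pmfs $p^j$ for $j \neq i$.

B. Solving the Inner Optimization Problem

We need to solve the optimization problem

$$\underset{\text{pmf } p^i}{\text{maximize}} \ \ f^i(p^i, \tilde{p}^i). \tag{29}$$

Any pmf $p^i$ can for some $p^i_0 \in [0, 1]$ be written as
$p^i = (p^i_0,\ 1 - p^i_0)^T$. We define

$$f^i_0(p^i_0, \tilde{p}^i) := f^i\!\left(\begin{pmatrix} p^i_0 \\ 1 - p^i_0 \end{pmatrix}, \tilde{p}^i\right). \tag{30}$$

We can now formulate our optimization problem as

$$\underset{p^i_0 \in [0, 1]}{\text{maximize}} \ \ f^i_0(p^i_0, \tilde{p}^i). \tag{31}$$

Note that the problems (29) and (31) are equivalent and
furthermore, by [10, Sec. 3.2.2], $f^i_0$ is a concave function of
$p^i_0$. Thus, our problem reduces to finding the maximum of a
concave function with a scalar argument. This can be done as
follows.

The first derivatives of $\mathbb{H}(Y)$, $\mathbb{H}(Y|B_i)$, and
$\hat{h}^j\bigl((p^i_0,\ 1 - p^i_0)^T, \tilde{p}^i\bigr)$, $j \neq i$, with
respect to $p^i_0$ are respectively given by

$$\frac{\partial \mathbb{H}(Y)}{\partial p^i_0} = -\sum_{k=1}^{n} \left[H^i (1\ \ {-1})^T\right]_k \log \left(H^i \begin{pmatrix} p^i_0 \\ 1 - p^i_0 \end{pmatrix}\right)_k \tag{32}$$

$$\frac{\partial \mathbb{H}(Y|B_i)}{\partial p^i_0} = -\sum_{k=1}^{n} \left[(H^i)_{k0} \log (H^i)_{k0} - (H^i)_{k1} \log (H^i)_{k1}\right] \tag{33}$$

$$\frac{\partial \hat{h}^j\bigl((p^i_0,\ 1 - p^i_0)^T, \tilde{p}^i\bigr)}{\partial p^i_0} = \sum_{b=0}^{1} p^j_b \sum_{k=1}^{n} \left[(H^{ji})_{k,b0} - (H^{ji})_{k,b1}\right] \log \left(H^{ji} \begin{pmatrix} \tilde{p}^i & 0 \\ 0 & \tilde{p}^i \end{pmatrix}\right)_{kb} \tag{34}$$

where we index the rows of $H^{ji}$ by $k = 1, \dots, n$ and the
columns by the binary expansion $b_j b_i = 00, 01, 10, 11$.
E.g., $(H^{ji})_{1,10}$ denotes the entry of $H^{ji}$ in the 1st row and
the 3rd column. For notational convenience, we write

$$\partial f^i_0(p^i_0, \tilde{p}^i) := \frac{\partial f^i_0(p^i_0, \tilde{p}^i)}{\partial p^i_0}. \tag{35}$$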



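Putting the expressions above together according to (27),
we get the first derivative of $f^i_0$. Since $f^i_0$ is concave,
$\partial f^i_0$ is monotonically decreasing in $p^i_0$. Consequently,
we can maximize $f^i_0$ over $p^i_0 \in [0, 1]$ as follows.

$$\operatorname*{argmax}_{p^i_0} f^i_0(p^i_0, \tilde{p}^i) = \begin{cases} 0, & \partial f^i_0(0^+, \tilde{p}^i) \le 0 \\ 1, & \partial f^i_0(1^-, \tilde{p}^i) \ge 0 \\ p^i_0 : \partial f^i_0(p^i_0, \tilde{p}^i) = 0, & \text{otherwise.} \end{cases} \tag{36}$$

In our implementation [8], we use the bisection method to find
$p^i_0$ in the third case. See Sec. VII for details.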
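
To illustrate the three cases in (36) together with the linearize-then-maximize loop of the convex-concave procedure, here is a toy Python sketch on a scalar problem. It does not evaluate the BICM objective: the objective is passed in through the derivatives of its concave part and its convex part. The names `argmax_concave` and `convex_concave`, the offset `eps` that stands in for the one-sided limits $0^+$ and $1^-$, and all default tolerances are assumptions for illustration.

```python
def argmax_concave(deriv, d=1e-7):
    """Maximize a concave function on [0, 1] from its derivative, following (36)."""
    eps = 1e-12                      # stand-in for the one-sided limits 0+ and 1-
    if deriv(eps) <= 0:              # first case: decreasing on the whole interval
        return 0.0
    if deriv(1.0 - eps) >= 0:        # second case: increasing on the whole interval
        return 1.0
    lo, hi = 0.0, 1.0                # third case: bisection on the sign of the derivative
    while hi - lo > 2 * d:
        mid = 0.5 * (lo + hi)
        if deriv(mid) > 0:
            lo = mid                 # the maximizer lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)           # within +/- d of the maximizer

def convex_concave(concave_deriv, convex_deriv, x0, tol=1e-5, max_iter=100):
    """Maximize concave(x) + convex(x) on [0, 1] by linearizing the convex part."""
    x = x0
    for _ in range(max_iter):
        x_prev = x
        slope = convex_deriv(x_prev)             # slope of the linear lower bound
        x = argmax_concave(lambda t: concave_deriv(t) + slope)
        if abs(x - x_prev) < tol:
            break
    return x

# Toy objective: concave part -(x - 0.2)^2, convex part 0.5 x^2
x_star = convex_concave(lambda t: -2.0 * (t - 0.2), lambda t: t, x0=0.9)
```

For this toy objective each step solves $-2(x - 0.2) + \tilde{x} = 0$, so the iterates follow $x \leftarrow 0.2 + \tilde{x}/2$ and approach the fixed point $x = 0.4$.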



V. Adding an average cost constraint

We discuss how BACM can be used to calculate BICM
capacity when the bit pmfs are subject to an average cost
constraint. Suppose we have a cost vector
$w \in \mathbb{R}^{2^m}_{>0}$, where $\mathbb{R}_{>0}$ denotes the set of
positive real numbers. Then, the symbol costs seen by the $i$th
bit are given by

$$w^i = \left[w^T (q_0\ \ q_1)\right]^T. \tag{37}$$

The average cost can now be included by adding a weighted
version of the average cost $(w^i)^T p^i$ to $f^i$, i.e., the inner
optimization problem in Alg. 2 now becomes

$$\underset{\text{pmf } p^i}{\text{maximize}} \ \ \left[f^i(p^i, \tilde{p}^i) - \lambda\,(w^i)^T p^i\right]. \tag{38}$$

This simply adds another linear term and our algorithm works
in exactly the same way as before. Denote by $p^{i*}$ the optimal
pmfs found by this modified version of BACM for some
$\lambda \ge 0$. Consider the resulting cost

$$E = w^T p^* \tag{39}$$

where $p^* = p^{1*} \otimes \cdots \otimes p^{m*}$. Then, it can be shown
that the bit pmfs $p^{1*}, \dots, p^{m*}$ solve the optimization
problem

$$\begin{aligned} \underset{p^1, \dots, p^m}{\text{maximize}} \quad & I^{\mathrm{bicm}}(p^1, \dots, p^m) \\ \text{subject to} \quad & w^T (p^1 \otimes \cdots \otimes p^m) \le E. \end{aligned} \tag{40}$$
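
The following short derivation, a sketch that uses only the vectors $q_0$, $q_1$ from (8), (9) and the per-bit cost vector $w^i$ from (37), shows why the average cost is a linear function of the single bit pmf $p^i$. It relies on the same decomposition $p = p^i_0 q_0 + p^i_1 q_1$ that underlies (11).

```latex
\begin{align*}
w^T p &= w^T \bigl( p^i_0\, q_0 + p^i_1\, q_1 \bigr) \\
      &= \begin{pmatrix} w^T q_0 & w^T q_1 \end{pmatrix} p^i \\
      &= (w^i)^T p^i .
\end{align*}
```

Because this term is linear in $p^i$, subtracting $\lambda\,(w^i)^T p^i$ from the concave function $f^i$ leaves it concave, so the inner problem can still be solved by the method of Subsec. IV-B.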



VI. Application to PAM in AWGN

We use BACM to calculate BICM capacity of PAM
constellations in AWGN. To calculate the BICM capacity of
PAM constellations in AWGN, optimization has to be done
over the labeling of the signal points, the scaling of the
constellation, and the bit pmfs, see [6, Eq. (40)] for details.
Here, we fix the labeling to the binary reflected Gray code
[6, Sec. II-B] and optimize over constellation scaling and bit
pmfs. To be able to use BACM, we discretize the channel
output into 200 equally spaced points. For each scaling, the
discretized AWGN channel with $M = 2^m$ constellation points
at the input can thus be represented by a DMC specified
by a transition matrix $H \in \mathbb{R}^{200 \times M}$.

[Figure 1: gap to AWGN capacity in percent versus SNR in dB; reference: AWGN capacity $0.5\log(1 + \mathrm{snr})$]

Fig. 1. Results for 4, 8, 16, 32, and 64-PAM in AWGN. In the horizontal
direction, SNR is displayed in dB. In the vertical direction, we show the
gap in percent to the AWGN capacity $C(\mathrm{snr}) = 0.5\log(1 + \mathrm{snr})$. E.g.,
for BICM capacity, the gap is calculated as
$100 \cdot \left(1 - \frac{C^{\mathrm{bicm}}(\mathrm{snr})}{C(\mathrm{snr})}\right)$. For
each constellation size and a corresponding target SNR, CM capacity, BICM
capacity, and uniform BICM capacity are displayed. For BICM capacity, we
display several values since we could adjust the effective SNR only via the
weighting factor $\lambda$, see Sec. V.

For this DMC, we use the method proposed in Sec. V to calculate
the BICM capacity. To achieve a target SNR, we iteratively adapt the
weighting $\lambda$ of the average power in (38). We repeat this
for different constellation scalings and choose the scaling
that yields the largest value for $I^{\mathrm{bicm}}$. This largest value is
the BICM capacity and we denote it by $C^{\mathrm{bicm}}(\mathrm{snr})$. Results
for 4, 8, 16, 32, and 64-PAM are displayed in Fig. 1. For
comparison, coded modulation (CM) capacity [6, Eq. (28)]
of the corresponding constellation and $I^{\mathrm{bicm}}$ for uniform bit
pmfs are displayed. The values for CM capacity were obtained
via CVX [13]. The BICM capacity significantly outperforms
uniform BICM and gets close to CM capacity. We calculated
the optimal bit pmfs with a precision of d = lO^'^. 
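
One possible way to obtain a $200 \times M$ transition matrix for PAM in AWGN is sketched below in Python. The text does not specify how the 200 output points are placed, so the choices here are assumptions: the output axis is split into equal-width bins over a range that extends `span` noise standard deviations beyond the outermost signal points, the two outer bins absorb the tails, and the noise has unit variance so that the SNR is set through `scale`. The helper names are hypothetical.

```python
import math

def pam_points(m, scale=1.0):
    """Equally spaced 2^m-PAM signal points, symmetric around zero."""
    M = 2 ** m
    return [scale * (2 * x - M + 1) for x in range(M)]

def gray_labels(m):
    """Binary reflected Gray code labels as bit tuples, one per signal point."""
    return [tuple((g >> (m - 1 - i)) & 1 for i in range(m))
            for g in (x ^ (x >> 1) for x in range(2 ** m))]

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def discretized_awgn(points, n=200, span=4.0):
    """n x M matrix with H[k][x] = P(output falls in bin k | signal point x sent)."""
    lo, hi = min(points) - span, max(points) + span
    edges = [lo + (hi - lo) * k / n for k in range(n + 1)]
    edges[0], edges[-1] = -math.inf, math.inf     # outer bins absorb the tails
    return [[std_normal_cdf(edges[k + 1] - s) - std_normal_cdf(edges[k] - s)
             for s in points] for k in range(n)]

points = pam_points(2)             # 4-PAM
labels = gray_labels(2)
H = discretized_awgn(points)
```

Note that the columns of `H` are ordered by signal point. To combine this with the Kronecker ordering in (3), the columns would have to be permuted so that the column index corresponds to the binary expansion of the Gray label.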

VII. Complexity of BACM

We start by analyzing the complexity of the inner optimization
problem. To cover the first two cases in (36), we need to
evaluate $\partial f^i_0$ two times. To find the $p^i_0$ in the third case
we use the bisection method starting with the upper bound $u = 1$
and the lower bound $\ell = 0$, and we terminate when
$u - \ell \le 2d$. After termination, we assign
$p^i_0 = \frac{\ell + u}{2}$. Thus, we calculate $p^i_0$ with a
precision of $\pm d$. According to [10, p. 146], the number
of times we need to evaluate $\partial f^i_0$ until termination is given by

$$\left\lceil \log_2 \frac{u - \ell}{2d} \right\rceil = \left\lceil \log_2 \frac{1 - 0}{2d} \right\rceil = \left\lceil \log_2 \frac{1}{2d} \right\rceil. \tag{41}$$

When evaluating $\partial f^i_0$, by (27), we need to evaluate
$\partial \hat{h}^j / \partial p^i_0$ for each $j \neq i$, which results in
$m - 1$ or roughly $m$ evaluations. Overall, the number of evaluations
needed for solving the inner optimization problem once is
roughly $m \log_2 \frac{1}{2d}$. The sizes of the matrices involved in
(28) are invariant under $m$, i.e., $H^{ji} \in \mathbb{R}^{n \times 4}$ and
$H^i \in \mathbb{R}^{n \times 2}$. Therefore, the number of iterations until
convergence in Alg. 2 should be approximately invariant under $m$ and
we denote it by a constant $K$. For our AWGN simulations, this number
was around $K \approx 3$, independent of $m$. The complexity of
maximizing $I^{\mathrm{bicm}}$ over one bit pmf is thus approximately
$K m \log_2 \frac{1}{2d}$. This maximization has to be done for
$i = 1, \dots, m$, i.e., $m$ times, which adds another factor of $m$ to
the complexity. This procedure has to be repeated $L$ times until
convergence in the outer loop of Alg. 1. This number depends on $m$.
For the AWGN simulations, we observed for $m = 2, 3, 4, 5, 6$,
respectively, the average values

$$2.00 \quad 3.27 \quad 3.90 \quad 4.24 \quad 4.31. \tag{42}$$

The average for each $m$ is taken separately over all values
that were observed when executing BACM. This value increases
slightly with $m$. To have a rough bound on complexity, we assume
that $L$ increases at most linearly with $m$, which is consistent
with the observed data (42). All together, we have a complexity
that is approximately of the order
$L K m^2 \log_2 \frac{1}{2d} \lesssim K m^3 \log_2 \frac{1}{2d}$.
In summary, BACM scales as $m^3$ and logarithmically in the
precision $d$.
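
To give a feeling for the two growth rates, the following Python sketch compares rough evaluation counts. It assumes that exhaustive search uses a grid with spacing $2d$ per bit pmf, which gives $(1/(2d))^m$ objective evaluations, and it uses the estimate $L K m^2 \log_2 \frac{1}{2d}$ with the observed $K \approx 3$ and the linear bound $L = m$ for BACM. The function names and the example value $d = 10^{-5}$ are chosen for illustration only, and the two counts refer to different elementary operations, so only the orders of magnitude are meaningful.

```python
import math

def exhaustive_evaluations(m, d):
    """Grid search with spacing 2d per bit pmf: (1/(2d))^m objective evaluations."""
    return (1.0 / (2.0 * d)) ** m

def bacm_evaluations(m, d, K=3, L=None):
    """Rough BACM count L*K*m^2*ceil(log2(1/(2d))); L defaults to the bound L = m."""
    L = m if L is None else L
    return L * K * m ** 2 * math.ceil(math.log2(1.0 / (2.0 * d)))

for m in (2, 4, 6):
    print(m, exhaustive_evaluations(m, 1e-5), bacm_evaluations(m, 1e-5))
```

For 64-PAM ($m = 6$) and this example precision, the grid search count is of the order $10^{28}$, while the BACM estimate is of the order $10^{4}$.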